Abstract. In this paper we investigate the problem of learning an unknown bounded function. We emphasize special cases where it is possible to provide very simple (in terms of computation) estimates that are, in addition, universal: their construction does not depend on a priori knowledge of regularity conditions on the unknown object, and yet they have almost optimal properties over a wide range of function spaces. These estimates are constructed using a thresholding scheme, which has proven in the last decade in statistics to have very good properties for recovering signals with inhomogeneous smoothness, but has not been extensively developed in Learning Theory.

We will consider two particular situations. In the first case, we consider the RKHS situation. In this case, we produce a new algorithm and investigate its performance in $L_2(\hat\rho_X)$. The exponential rates of convergence are proved to be almost optimal, and the regularity assumptions are expressed in simple terms.

The second case considers a more specific situation where the $X_i$'s are one dimensional and the estimator is a wavelet thresholding estimate. The results are comparable in this setting to those obtained in the RKHS situation as concerns the critical value and the exponential rates. The advantage here is that we are able to state the results in the $L_2(\rho_X)$ norm and the regularity conditions are expressed in terms of standard Hölder spaces.

1. Introduction 

In this paper, we are interested in the problem of learning an unknown function defined on a set $X$ which takes values in a set $Y$. We assume that $X$ is a compact domain in $\mathbb{R}^d$ and $Y = [-M/2, M/2]$ is a finite interval in $\mathbb{R}$. This problem, also called the regression problem, has a long history in Statistics (many references can be found, for example, in the books [Ibragimov and Has'minskii, 1981], [Van de Geer, 2001] and [Györfi et al., 2002]). It has recently drawn much attention in the work of [Cucker and Smale, 2002] and was amplified upon in [Poggio and Smale, 2003].

We assume that we observe an $n$-sample $Z_1, \dots, Z_n$ of $Z = (X, Y)$. The distribution of $Z$ is denoted by $\rho$. Our aim is to recover the function $f_\rho$:

$$f_\rho(x) = E_\rho[Y \mid X = x].$$

Our goal is to obtain estimates of $f_\rho$ with the error measured in the $L_2(X, \rho_X)$ norm, or in $L_2(X, \hat\rho_X)$, where $\hat\rho_X$ is the empirical measure calculated on the $X_i$'s.
Given any $\eta > 0$, if $\hat f$ is an estimator of $f_\rho$ (i.e. a measurable function of $Z_1, \dots, Z_n$, taking its values in the set, say, of bounded functions),

$$\rho^{\otimes n}\{z : \|\hat f - f_\rho\| > \eta\} \qquad (1)$$

measures the confidence we have that the estimator $\hat f$ is accurate to tolerance $\eta$.

Contrary to Statistics, where people are mainly concerned with the evaluation of moments of $\|\hat f - f_\rho\|$ (except for rare examples, see [Korostelev, 2003] or [Korostelev and Spokoiny, 1996]), Learning Theory focuses on investigating the decay of (1) as $n \to \infty$ and $\eta$ increases.

Another difference with the Statistics point of view is that one main goal in Learning Theory is to obtain results with almost no assumptions on the distribution $\rho$. However, it is known that it is not possible to have fast rates of convergence without assumptions, and a large portion of Statistics and Learning Theory proceeds under the condition that $f_\rho$ is in a known set $\Theta$. Typical choices of $\Theta$ are compact sets determined by some smoothness condition or by some prescribed rate of decay for a specific approximation process. Given our prior $\Theta$ and the associated class $M(\Theta)$ of measures $\rho$, [DeVore et al., 2004] defined, for each $\eta > 0$, the accuracy confidence function

$$AC_n(\Theta, \hat f, \eta) := \sup_{\rho \in M(\Theta)} \rho^{\otimes n}\{z : \|f_\rho - \hat f\| > \eta\}. \qquad (2)$$

This quantity measures a uniform confidence (over the space $M(\Theta)$) we have that the estimator $\hat f$ is accurate to tolerance $\eta$.

Upper and lower bounds for $AC_n$ have been proved in [DeVore et al., 2004]. In most examples, there is a critical $\eta_n = \eta(n, \Theta)$ after which (2) decreases exponentially. This critical value $\eta(\Theta, n)$ is essential since it yields, as a consequence, bounds of type $e_n(\Theta, \hat f) \le C\eta(\Theta, n)^q$, which have been extensively studied in statistics, for

$$e_n(\Theta, \hat f) = \sup_{\rho \in M(\Theta)} E_{\rho^{\otimes n}} \|\hat f - f_\rho\|^q. \qquad (3)$$

To evaluate lower bounds for the function $AC_n(\Theta, \hat f, \eta)$, [DeVore et al., 2004] considered:

$$AC_n(\Theta, \eta) := \inf_{\hat f} \sup_{\rho \in M(\Theta)} \rho^{\otimes n}\{\|f_\rho - \hat f\| > \eta\}$$

and the following result has been established:

$$AC_n(\Theta, \eta) \ge C \begin{cases} e^{-c n \eta^2}, & \eta \ge \eta_n, \\ 1, & \eta \le \eta_n, \end{cases}$$

where $\eta_n$ is defined by the relation $\ln \bar N(\Theta, \eta_n) \sim c_2 n \eta_n^2$. Here $\bar N(\Theta, \eta)$ is the 'tight entropy' defined by:

$$\bar N(\Theta, \eta) := \sup\{N : \exists f_0, f_1, \dots, f_N \in \Theta, \text{ with } c_0\eta \le \|f_i - f_j\|_{L_2(\rho_X)} \le c_1\eta,\ \forall i \ne j\}.$$

For instance, $\eta_n = n^{-\frac{s}{2s+d}}$ for the Besov space $B^s_q(L_\infty(\mathbb{R}^d))$, which corresponds to similar results proved in statistics (actually with more restrictive assumptions on the set of probabilities $\rho$):

$$\inf_{\hat f} \sup_{\rho \in M'(B^s_q(L_\infty(\mathbb{R}^d)))} E\|f_\rho - \hat f\|_{dx} \ge c\, n^{-\frac{s}{2s+d}}.$$

See, for instance, [Ibragimov and Has'minskii, 1981], [Stone, 1982], [Nemirovskiy, 1985] for a slightly more restricted context than Besov spaces, and [Donoho et al., 1995].

Concerning upper bounds for $AC_n(\Theta, \eta)$, many converse results have been established: see for instance [Yang and Barron, 1999] in a statistical context, and [Cucker and Smale, 2002], [DeVore et al., 2004], [Konyagin and Temlyakov, 2004] in learning theory. These upper bounds are generally proved using particular estimation methods, most often based on empirical mean square minimization:

$$\hat f = \mathrm{Argmin}\Big\{\sum_{i=1}^n (Y_i - f(X_i))^2,\ f \in H_n\Big\}.$$

These very nice estimation rules nevertheless raise two important problems. First, they generally require heavy computation times. The second serious problem lies in the fact that their construction (the choice of $H_n$) most of the time depends heavily on $\Theta$. There also exist universal estimates (see [Temlyakov, 2005]); however, these rules are up to now prohibitive in terms of computation time.

Our aim in this paper will be to emphasize special constructions and cases where it is possible to provide very simple (in terms of computation) estimates that are, in addition, universal: their construction does not depend on a particular $\Theta$, and yet they have almost optimal properties for a wide range of spaces $\Theta$. These estimates are constructed using a thresholding scheme, which has proven in the last decade in statistics to have very good properties for recovering signals with inhomogeneous smoothness.

In this paper, we will consider two particular situations. In the first case, we consider the RKHS situation. In this case, we produce a new algorithm and investigate its performance in $L_2(\hat\rho_X)$. The exponential rates of convergence are good: the critical value $\eta_n$ is the one predicted by [DeVore et al., 2004], and the exponential rates are comparable to those recently obtained by [Smale and Zhou, 2005] (SZ in what follows), although the loss is not the same ($L_2(\rho_X)$ in SZ) and the regularity assumptions are somewhat different: in SZ, regularity assumptions are expressed in terms of RKHS spaces. These assumptions may seem more intrinsic. However, it is difficult to figure out exactly what they mean since they depend on the unknown measure $\rho_X$. Our conditions also depend on the kernel, but are easy to interpret.

The second case considers a more specific situation where the $X_i$'s are one dimensional and the estimator is a wavelet thresholding estimate. The results are comparable in this setting to those obtained in the RKHS situation as concerns the critical value and the exponential rates. The advantage here is that we are able to state the results in the $L_2(\rho_X)$ norm and the regularity conditions are expressed in terms of standard Hölder spaces.



2. Least squares and thresholding procedures 

In this short section, we consider the construction of our thresholding estimates. To make them easier to understand and to motivate them, we give here a connection with general least squares estimates. However, this construction will not be used in the sequel, and a reader in a hurry can skip it and go directly to the next section.

Empirical mean square minimization consists in considering

$$\hat f = \mathrm{Argmin}\Big\{\sum_{i=1}^n (Y_i - f(X_i))^2,\ f \in H_n\Big\}$$

for a specified set $H_n$. Let us look at particular cases of $H_n$ leading to especially computable forms of $\hat f$. Let us suppose that we have a collection of functions $(e_k)$ verifying the following property:

$$(P): \quad \frac1n \sum_{i=1}^n e_k(X_i) e_l(X_i) = \delta_{kl}$$

(i.e. $(e_k)$ is an orthonormal system for the empirical measure $\hat\rho_X = \frac1n \sum_{i=1}^n \delta_{X_i}$ on the $X_i$'s, where $\delta_x$ is the Dirac measure at the point $x$).

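Now, associated with this collection of functions, let us consider the following particular spaces:

$$H_n^{(1)} = \Big\{f = \sum_{i=1}^N \alpha_i e_i\Big\}, \qquad H_n^{(2)} = \Big\{f = \sum_{i=1}^N \alpha_i e_i,\ \sum_{i=1}^N |\alpha_i| \le \kappa\Big\},$$

$$H_n^{(3)} = \Big\{f = \sum_{i=1}^N \alpha_i e_i,\ \#\{|\alpha_i| \ne 0\} \le \kappa\Big\}.$$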

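If we now introduce the following three estimates of these coefficients:

$$\hat\alpha_k = \frac1n \sum_{i=1}^n e_k(X_i) Y_i, \qquad \hat\alpha_k^{(1)} = \mathrm{sign}(\hat\alpha_k)\,(|\hat\alpha_k| - \lambda)_+, \qquad \hat\alpha_k^{(2)} = \hat\alpha_k I\{|\hat\alpha_k| \ge \lambda\},$$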

it is easy to prove that there exists $\lambda(\kappa)$ such that the following rules are empirical minimizers for the respective spaces $H_n^{(i)}$, $i \in \{1, 2, 3\}$:

$$\hat f^1 = \sum_{k=1}^N \hat\alpha_k e_k, \qquad \hat f^2 = \sum_{k=1}^N \hat\alpha_k^{(1)} e_k, \qquad \hat f^3 = \sum_{k=1}^N \hat\alpha_k^{(2)} e_k.$$

These three rules are common in the statistical literature. $\hat f^1$ is generally referred to as the linear estimate, whereas $\hat f^2$ and $\hat f^3$ are known as (respectively soft and hard) thresholding estimates.
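The three coefficient rules above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the array layout (`E[k, i]` holding $e_k(X_i)$) are hypothetical, and the soft rule uses the usual form $\mathrm{sign}(\hat\alpha_k)(|\hat\alpha_k| - \lambda)_+$.

```python
import numpy as np

def empirical_coefficients(E, Y):
    """alpha_hat_k = (1/n) * sum_i e_k(X_i) * Y_i.

    E is an (N, n) array with E[k, i] = e_k(X_i) (assumed layout);
    Y is the length-n vector of responses.
    """
    return E @ Y / Y.shape[0]

def soft_threshold(alpha, lam):
    """Soft thresholding: shrink every coefficient towards 0 by lam."""
    return np.sign(alpha) * np.maximum(np.abs(alpha) - lam, 0.0)

def hard_threshold(alpha, lam):
    """Hard thresholding: keep a coefficient only if |alpha_k| >= lam."""
    return alpha * (np.abs(alpha) >= lam)
```

The linear estimate uses `empirical_coefficients` directly, while the two thresholding estimates apply `soft_threshold` or `hard_threshold` to its output before summing against the $e_k$.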

Our aim in this paper is to study the behavior of these estimators, principally $\hat f^3$, in different situations. The main difficulty of this paradigm obviously lies in the question: how should one choose the functions $(e_k)$ so that condition (P) is verified, and how should one choose the tuning constants $N$, $\lambda$?

The first problem is difficult to solve, if not impossible, and in the sequel we will not assume that property (P) is verified; instead, we are going to consider situations where this property can be considered as 'almost true'.

3. RKHS situation
3.1. Assumptions, estimation rules and regularity conditions.

3.1.1. Assumptions on the kernel. Let us take the case of a symmetric kernel $K(\cdot, \cdot)$ (we do not explicitly need the fact that $K$ is a Mercer kernel). We assume that the kernel $K$ is uniformly bounded by an absolute constant $\kappa$. Our fundamental assumption will be the following:

(A): There exists a set of $p$ deterministic points in $\mathbb{R}^d$,

$$\{x_1, \dots, x_p\}$$

($p$ will tend to infinity with $n$) such that the $p \times p$ matrix $M_{np}$ whose entries are

$$\Big(\frac1n \sum_{i=1}^n K(x_l, X_i) K(X_i, x_k)\Big)_{lk}$$

is almost diagonal, in the sense that there exists $0 < \delta < 1$ such that:

$$\forall x \in \mathbb{R}^p, \quad \|x\|_{l_2}^2 (1 - \delta)^2 \le x^t M_{np}\, x \le \|x\|_{l_2}^2 (1 + \delta)^2 \qquad (4)$$

$$\|x\|_{l_\infty} (1 - \delta) \le \|M_{np}\, x\|_{l_\infty} \qquad (5)$$

We do not assume anything about $\delta$, but this quantity will enter into the performance results of the procedure; $\delta$ should be as small as possible. Notice that in general, such an assumption reflects the concentration properties of the kernel, and is quite easy to verify in practical situations where $\delta$ can be computed empirically. In particular, we allow $\delta$ in the sequel to be a random quantity depending on the observations.
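As a rough illustration of how $\delta$ might be computed empirically, the sketch below derives the smallest $\delta$ compatible with (4) and (5) from a given $p \times n$ kernel matrix. The function names are hypothetical. The sketch relies on two standard linear algebra facts: (4) constrains the eigenvalues of the symmetric matrix $M_{np}$ to lie in $[(1-\delta)^2, (1+\delta)^2]$, and (5) holds for all $x$ exactly when $1 - \delta \le 1/\|M_{np}^{-1}\|_{\infty \to \infty}$.

```python
import numpy as np

def gram(K):
    """M_np = (1/n) K K^t for a p x n matrix K with K[l, i] = K(x_l, X_i)."""
    return K @ K.T / K.shape[1]

def smallest_delta(M):
    """Smallest delta >= 0 for which M satisfies both (4) and (5).

    A returned value >= 1 means assumption (A) fails for this M.
    Assumes M is symmetric and invertible.
    """
    eig = np.linalg.eigvalsh(M)
    # (4): (1 - delta)^2 <= eigenvalues <= (1 + delta)^2
    d4 = max(1.0 - np.sqrt(max(eig.min(), 0.0)), np.sqrt(eig.max()) - 1.0)
    # (5): inf_x ||M x||_inf / ||x||_inf = 1 / ||M^{-1}||_inf,
    # where the induced l_inf norm is the maximal absolute row sum.
    inv_inf_norm = np.abs(np.linalg.inv(M)).sum(axis=1).max()
    d5 = 1.0 - 1.0 / inv_inf_norm
    return max(d4, d5, 0.0)
```

For a perfectly orthonormal design, `smallest_delta` returns 0; the further $M_{np}$ is from the identity, the larger the returned value.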

3.1.2. Estimation rule. Let us consider the following estimation rule. We will denote by $Y$ the vector with coordinates $Y_i$; $\varepsilon_i = Y_i - f_\rho(X_i)$, and $\varepsilon$ will be the vector with coordinates $\varepsilon_i$. Let us denote by $f_X$ the $n$-dimensional vector whose entries are $f_\rho(X_i)$, and by $K$ the $p \times n$ matrix whose entries are $K(x_l, X_i)$ (so $\frac1n K K^t = M_{np}$), and introduce:

$$t_n = \frac{\log n}{n}, \qquad \lambda_n = T\sqrt{t_n}, \qquad (6)$$

$$z = (z_1, \dots, z_p)^t = (KK^t)^{-1} K Y, \qquad (7)$$

$$\tilde z = (\tilde z_1, \dots, \tilde z_p)^t, \qquad \tilde z_l = z_l I\{|z_l| \ge \lambda_n\}. \qquad (8)$$

T will be chosen so that T > yj M 2 + j V 4, and finally, our estimate will be : 

$$\hat f = \sum_{l=1}^p \tilde z_l K(x_l, \cdot). \qquad (9)$$

As is easily seen, $\hat f$ takes its inspiration from the hard thresholding rule $\hat f^3$, and it is worthwhile to notice that its construction does not depend on any regularity parameter.
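The rule (6)-(9) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes one-dimensional inputs, a user-supplied vectorized `kernel(u, v)` function, and a threshold constant `T` chosen by the user; all function names are hypothetical.

```python
import numpy as np

def rkhs_threshold_fit(kernel, centers, X, Y, T):
    """Return the thresholded coefficients z_tilde of (8).

    kernel(u, v) must broadcast over NumPy arrays (assumption);
    centers holds the p points x_1..x_p, X and Y the n observations.
    """
    n = X.shape[0]
    K = kernel(centers[:, None], X[None, :])   # p x n, K[l, i] = K(x_l, X_i)
    z = np.linalg.solve(K @ K.T, K @ Y)        # (7): z = (K K^t)^{-1} K Y
    lam = T * np.sqrt(np.log(n) / n)           # (6): lambda_n = T sqrt(t_n)
    return z * (np.abs(z) >= lam)              # (8): hard thresholding

def rkhs_threshold_predict(kernel, centers, z_tilde, x):
    """Evaluate (9): f_hat(x) = sum_l z_tilde_l K(x_l, x)."""
    return kernel(x[:, None], centers[None, :]) @ z_tilde
```

A possible kernel for experiments is a Gaussian, e.g. `lambda u, v: np.exp(-(u - v) ** 2 / (2 * h ** 2))` for some bandwidth `h`; the solve step requires $KK^t$ to be invertible, which is what assumption (A) guarantees.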

3.1.3. Regularity conditions. We will assume the following sparsity conditions on the function $f_\rho$.

Let us take 
For any $n$, there exist $\alpha_1, \dots, \alpha_p$ such that

$$\Big\|f_\rho - \sum_{l=1}^p \alpha_l K(x_l, \cdot)\Big\|_\infty \le c\, p^{-3/2} \qquad (10)$$

$$\forall \lambda > 0, \quad \mathrm{card}\{|\alpha_l| \ge \lambda\} \le c\, \lambda^{-\frac{2}{1+2s}} \qquad (11)$$

These conditions reflect approximation properties for the function $f_\rho$ by linear combinations of vectors in the RKHS (when $K$ is a Mercer kernel). These properties are quantified by conditions on the coefficients $\alpha_l$, which are standard in various situations (Fourier, wavelet coefficients, and so on). As discussed in [Kerkyacharian and Picard, 2000], condition (10) reflects a 'minimal compactness condition' which interferes neither in the entropy calculations (for instance) nor in the minimax rates of convergence. Condition (11) does drive the rates. It is given here with a Lorentz type constraint on the $\alpha_l$'s. These conditions are obviously implied by $l_r$ conditions (for appropriate $r$), which then look very much like Besov conditions.

We will measure the error by the following norm (empirical norm):

$$\|g\|_{\hat\rho}^2 = \frac1n \sum_{i=1}^n g(X_i)^2 \qquad (12)$$

$\lfloor x \rfloor$ denotes the integer part of $x$. Our result is the following:
Theorem 1. Let us take 



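For any $s > 1/2$, we define

$$\eta_n = \Big[\frac{n}{\log n}\Big]^{-\frac{s}{1+2s}}.$$

Under the conditions above, there exists a constant $D$ such that

$$\sup_{\rho}\ \rho^{\otimes n}\big\{\|f_\rho - \hat f\|_{\hat\rho} > (1 - \delta)^{-1}\eta\big\} \le T \begin{cases} e^{-c[n p^{-1} \eta^2 \vee \log n]}, & \eta \ge D\eta_n, \\ 1, & \eta \le D\eta_n. \end{cases} \qquad (13)$$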

Remark 1. As mentioned in the introduction, these results prove that the behavior of this estimator is optimal in terms of the critical value $\eta_n$ as predicted in [DeVore et al., 2004]. In terms of exponential rates, they are suboptimal because of the term $p^{-1}$. However, it is worthwhile to notice that these rates are still good: they are comparable to those obtained by [Smale and Zhou, 2005], although the loss is not the same and the regularity assumptions are somewhat different. In addition, we observe that, even if not entirely optimal, these rates are always better than $n^{-c}$.

Finally, it is important to notice the following technical facts, which will be crucial in the sequel: because $s > 1/2$, $\eta_n \ge p^{-1}$. Condition (11) can obviously always be replaced by:

$$\forall \lambda > 0, \quad \mathrm{card}\{|\alpha_l| \ge \lambda\} \le c\, \lambda^{-\frac{2}{1+2s}} \wedge p \qquad (14)$$

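3.2. Proof of the theorem. First, let us remark that

$$\begin{aligned}
\|f_\rho - \hat f\|_{\hat\rho} &\le \Big\|f_\rho - \sum_{l=1}^p \alpha_l K(x_l, \cdot)\Big\|_{\hat\rho} + \Big\|\sum_{l=1}^p \alpha_l K(x_l, \cdot) - \hat f\Big\|_{\hat\rho} \\
&\le c\, p^{-3/2} + \Big\|\sum_{l=1}^p (\alpha_l - \tilde z_l) K(x_l, \cdot)\Big\|_{\hat\rho} \\
&\le c\, p^{-3/2} + \big[(\alpha - \tilde z)^t M_{np} (\alpha - \tilde z)\big]^{1/2} \\
&\le c\, \eta_n + (1 + \delta)\Big[\sum_{l=1}^p (\alpha_l - \tilde z_l)^2\Big]^{1/2}.
\end{aligned}$$

Notice that the second line uses hypothesis (10), and the last one (4). Moreover,

$$\begin{aligned}
\sum_{l=1}^p (\alpha_l - \tilde z_l)^2 &\le \sum_{l=1}^p (\alpha_l - z_l)^2 I\{|z_l| \ge \lambda_n\}\big[I\{|\alpha_l| \ge \lambda_n/2\} + I\{|\alpha_l| < \lambda_n/2\}\big] \\
&\quad + \sum_{l=1}^p \alpha_l^2 I\{|z_l| < \lambda_n\}\big[I\{|\alpha_l| \ge 2\lambda_n\} + I\{|\alpha_l| < 2\lambda_n\}\big] \\
&:= BB + BS + SB + SS.
\end{aligned}$$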

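Let us study the term SS. First we remark that, because of condition (11) on $f_\rho$, we know that

$$\mathrm{card}\{|\alpha_l| \ge \lambda_n\} \le c\, \lambda_n^{-\frac{2}{1+2s}},$$

and it is not difficult to prove that (11) is equivalent to the following characterization (the result is standard in Lorentz spaces and in any case can be found in [Cohen et al., 2001]):

$$\forall \lambda > 0, \quad \sum_l \alpha_l^2 I\{|\alpha_l| \le \lambda\} \le c\, \lambda^{\frac{4s}{1+2s}} \qquad (15)$$

Hence, using (15):

$$SS \le c\, \lambda_n^{\frac{4s}{1+2s}} = c\,\big[T\sqrt{t_n}\big]^{\frac{4s}{1+2s}} = c\, T^{\frac{4s}{1+2s}}\, \eta_n^2.$$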

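Let us now investigate the term SB. We observe that $I\{|z_l| < \lambda_n\} I\{|\alpha_l| \ge 2\lambda_n\} \le I\{|\alpha_l - z_l| \ge |\alpha_l|/2\} I\{|\alpha_l| \ge 2\lambda_n\}$, hence:

$$SB \le \sum_{l=1}^p \alpha_l^2 I\{|z_l - \alpha_l| \ge |\alpha_l|/2\} I\{|\alpha_l| \ge 2\lambda_n\} \le 4 \sum_{l=1}^p (\alpha_l - z_l)^2 I\{|\alpha_l| \ge 2\lambda_n\}.$$

In the same way:

$$BB = \sum_{l=1}^p (\alpha_l - z_l)^2 I\{|z_l| \ge \lambda_n;\ |\alpha_l| \ge \lambda_n/2\} \le \sum_{l=1}^p (\alpha_l - z_l)^2 I\{|\alpha_l| \ge \lambda_n/2\}.$$

So BB and SB can be treated in the same way, since

$$\sum_{l=1}^p (\alpha_l - z_l)^2 I\{|\alpha_l| \ge 2\lambda_n\} \le \sum_{l=1}^p (\alpha_l - z_l)^2 I\{|\alpha_l| \ge \lambda_n/2\},$$

which gives

$$BB + SB \le 5 \sum_{l=1}^p (\alpha_l - z_l)^2 I\{|\alpha_l| \ge \lambda_n/2\}.$$

Let

$$p^* = \mathrm{card}\{|\alpha_l| \ge \lambda_n/2\} \le c\,(\lambda_n/2)^{-\frac{2}{1+2s}} \qquad (16)$$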

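3.2.1. Study of $\sum_{l=1}^p (\alpha_l - z_l)^2 I\{|\alpha_l| \ge \lambda_n/2\}$. Let us denote by $\tilde f_X$ the vector with coordinates $[\tilde f_X]_i = \tilde f(X_i) = \sum_{l=1}^p \alpha_l K(x_l, X_i)$:

$$\tilde f_X = K^t \alpha.$$

Let us recall that $f_X$ is the $n$-dimensional vector whose entries are $f_\rho(X_i)$, and that by hypothesis (10), $|f_\rho(X_i) - \tilde f(X_i)| \le c\, p^{-3/2}$. So that

$$\begin{aligned}
\alpha &= (KK^t)^{-1} K \tilde f_X, \\
z &= (KK^t)^{-1} K Y = (KK^t)^{-1} K [f_X + \varepsilon], \\
\alpha - z &= -(KK^t)^{-1} K \varepsilon + (KK^t)^{-1} K [\tilde f_X - f_X].
\end{aligned}$$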

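From this we deduce

$$\|\alpha - z\|_{l_2(p)} \le \|(KK^t)^{-1} K \varepsilon\|_{l_2(p)} + \|(KK^t)^{-1} K [\tilde f_X - f_X]\|_{l_2(p)}.$$

But, since $(KK^t)^{-1} = \frac1n M_{np}^{-1}$, and using (4),

$$\begin{aligned}
\|(KK^t)^{-1} K [\tilde f_X - f_X]\|_{l_2(p)} &= \Big\|M_{np}^{-1}\, \frac1n K [\tilde f_X - f_X]\Big\|_{l_2(p)} \\
&\le (1 - \delta)^{-1} \Big\|\frac1n K [\tilde f_X - f_X]\Big\|_{l_2(p)} \\
&\le (1 - \delta)^{-1} \kappa \sqrt{p}\, \|\tilde f_X - f_X\|_{l_\infty(n)} \\
&\le (1 - \delta)^{-1} \frac{c\,\kappa}{p} \le c\,(1 - \delta)^{-1} \kappa\, \eta_n. \qquad (17)
\end{aligned}$$

From the calculations above and (17), we deduce

$$\sum_{l=1}^p (\alpha_l - z_l)^2 I\{|\alpha_l| \ge \lambda_n/2\} \le \sum_{l=1}^p \big((KK^t)^{-1} K \varepsilon\big)_l^2 I\{|\alpha_l| \ge \lambda_n/2\} + \big[c\,(1 - \delta)^{-1} \kappa\, \eta_n\big]^2. \qquad (18)$$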



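Let us now recall the following inequality due to Pinelis [Pinelis, 1994]: assuming that the $\xi_i$'s are Hilbert space valued, independent random variables such that $\|\xi_i - E(\xi_i)\| \le M$ and $E\|\xi_i - E(\xi_i)\|^2 \le \sigma^2(\xi)$,

$$\mathrm{Prob}\Big(\Big\|\frac1n \sum_{i=1}^n [\xi_i - E\xi_i]\Big\| \ge \lambda\Big) \le 2\exp\Big\{-\frac{n\lambda^2}{2(\lambda M/3 + \sigma^2(\xi))}\Big\}. \qquad (19)$$

Now, as $\sigma^2(\xi) \le M^2$, replacing $\sigma^2(\xi)$ in the right-hand side, we get:

$$\mathrm{Prob}\Big(\Big\|\frac1n \sum_{i=1}^n [\xi_i - E\xi_i]\Big\| \ge \lambda\Big) \le 2\exp\Big\{-\frac{n\lambda^2}{2(\lambda M/3 + M^2)}\Big\}. \qquad (20)$$

Only $\lambda \le M$ is significant, since $\mathrm{Prob}(\|\frac1n \sum_{i=1}^n [\xi_i - E\xi_i]\| \ge \lambda) = 0$ for $\lambda > M$; in that range $2(\lambda M/3 + M^2) \le 8M^2$, so

$$\mathrm{Prob}\Big(\Big\|\frac1n \sum_{i=1}^n [\xi_i - E\xi_i]\Big\| \ge \lambda\Big) \le 2\exp\Big\{-\frac{n\lambda^2}{8M^2}\Big\}. \qquad (21)$$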

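Let us now take $\xi_i \in \mathbb{R}^p$:

$$(\xi_i)_l = K(x_l, X_i)\, \varepsilon_i,$$

in such a way that

$$\sum_i \xi_i = K\varepsilon$$

and the $\xi_i$ are independent. It is easy to verify that $E(\xi_i) = 0$.
Let us define, for all $U \in \mathbb{R}^p$, the following Hilbertian norm:

$$\|U\|_*^2 = \sum_{l=1}^p \big(n (KK^t)^{-1} U\big)_l^2 I\{|\alpha_l| \ge \lambda_n/2\} = \sum_{l=1}^p \big(M_{np}^{-1} U\big)_l^2 I\{|\alpha_l| \ge \lambda_n/2\}.$$

Then,

$$\sum_{l=1}^p \big((KK^t)^{-1} K\varepsilon\big)_l^2 I\{|\alpha_l| \ge \lambda_n/2\} = \Big\|\frac1n \sum_i \xi_i\Big\|_*^2.$$

Now, using (5), we have

$$\begin{aligned}
\|\xi_i\|_*^2 &= \sum_{l=1}^p \big(M_{np}^{-1} \xi_i\big)_l^2 I\{|\alpha_l| \ge \lambda_n/2\} \\
&\le p^* \big(\sup_l |(M_{np}^{-1} \xi_i)_l|\big)^2 \\
&\le p^* \big(\sup_l |K(x_l, X_i)\, \varepsilon_i|\big)^2 \frac{1}{(1 - \delta)^2} \\
&\le p^* \frac{(M\kappa)^2}{(1 - \delta)^2} \le p\, \frac{(M\kappa)^2}{(1 - \delta)^2}.
\end{aligned}$$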

Now, using (J2T 



Prob(||l |> - EUf > < 2e*p 

So for $a > 0$ suitably chosen, and taking into account that $\eta \ge \eta_n$,
P 0n ( ^(ai-zi) 2 I{l^|>A n /2}>i^i)



- (2cm) 2 

(1-6)2' 



< pnUCCKK^^KeJfEflaJ > A n /2}+ [ck(1 - SrVJ 2 > t^^^) 



1=1 

< p^i^KKT^em^ > A n /2} > -0^) < lexp-^nn 2 -^} 

Now, if we recall that r| n = (i2£Il)s/(i+2s). p -i = ^ An = and p * < 4 c (Tt n )ir^ A 

p, evaluation at the point rj = rin gives : 

2 6XP - { l n7]i ^4^ } = 2 GXP - { l bg n c(l)^M^K^ } - 

Hence 



p®n(V[l y- e-K^Xi)] 2 > ti 2 /2) < exp-C^p- 1 Vlogn] 
z — n z — 

l i 

3.2.2. Study of $\sum_{l=1}^p (\alpha_l - z_l)^2 I\{|z_l| \ge \lambda_n\} I\{|\alpha_l| < \lambda_n/2\}$. It remains now to study the term:

$$BS = \sum_{l=1}^p (\alpha_l - z_l)^2 I\{|z_l| \ge \lambda_n\} I\{|\alpha_l| < \lambda_n/2\} \le \sum_{l=1}^p (\alpha_l - z_l)^2 I\{|z_l - \alpha_l| \ge \lambda_n/2\}.$$

Using the previous result with $p$ instead of $p^*$, we get,

( BS > (2aTl)2 ^ 



< p ^(£((KK t )- 1 Ke)?> 7 j^£ 5 ) 



i=i 

< lexp-^ 2 -^-,} 

We proceed as in the previous section, and obtain using (jJJJ) : 

|ai-zi| < ItfKK^^KeJJ + IKKK^^Ktfx-fxllL 

< l-fMnp^KeKI + fl -Sj-Vp- 3 / 2 
n 



< (1-6)- 1 ||lKe|| loo +(l-6)- 1 Vt;M K 



n 

So 

p^ n (3l G {1 , . . . ,p}, |ai - z t | > A n ) < p^O - S)- 1 sup |-(Ke) l | + (1 - S)" 1 k^ 2 > A T 

i n 
1



< y p ®n ((1 _ 6) -i|_ y K(x l( xo ei | + (i - b)-\ v - yi > TVu) 

z — n *• — 

i 

v^i^ . A i„ t T ..r lo g n M, 

But for n large enough 



n 

i=i 

< y- p^ ( |i y K( Xi)£ .| > ^g~^( ' k(— — — — ^ ' ' 

z — n z — 1—6 n 

i=i i 



Jf__^logn ll/4l > T_ 



v 1 -6 n J 7 - 2(1 — 6) 
and using Hoeffding inequality 

p®n ( |l J" K(xi,X0e t | > — -L Q < 2exp T " 1< * U 



So 



n^- w x '- 2(1-6) y^g^n 8(1-6) 2 k 2 M 2 

p 0n (3l G {1 , . . . , p}, |at - z x \ > A n ) < 2p exp - — ligf-L- < Cn- a 



8(1 -6) 2 k 2 M 2 



with a > if T is large enough. 
So : 



1—6 8 pM z K 2 

This yields the results. 

4. Wavelet results
4.1. Assumptions and estimation rules.

4.1.1. Assumptions on the model. In this section, we will concentrate on the case of dimension 1: the random variables $X_i$ now take their values in $X$, a compact domain of $\mathbb{R}$. This case can easily be generalized to the case where the measure $\rho_X$ is a tensor product of measures $\rho_{X_i}$, $i = 1, \dots, d$. However, the full generalization to dimension $d$ is more involved and will not be discussed in this paper. In the case $d = 1$, we define the distribution function $G$ such that

$$\forall t \in \mathbb{R}, \quad G(t) = \rho(X \le t) \in [0, 1]$$

and assume that it is a differentiable function. We also define

$$\forall x \in [0, 1], \quad G^{-1}(x) = \inf\{t \in \mathbb{R},\ G(t) \ge x\}.$$

Again, we will assume that $f_\rho$ satisfies sparsity conditions, which can in this case be directly expressed in terms of regularity conditions. More precisely, we will denote by $M(\Theta_s)$ the set of measures $\rho$ verifying all the assumptions above with, in addition, the fact that $f_\rho(G^{-1}) \in B^s_\infty(L_\infty([0, 1]))(M)$ (the ball of radius $M$ of the Besov space). Notice that, as we will only consider the case where $s > 0$ (in fact $s > 1/2$), $f_\rho$ will always be bounded by $M$.

Let us consider $\{\psi_{jk},\ j \ge -1,\ 0 \le k \le 2^j\}$, a wavelet basis on $[0, 1]$ (at least continuously differentiable, with enough moment conditions; the length of the support of $\psi_{jk}$ will be supposed to be less than $N 2^{-j}$). We recall that $\psi_{-1,k} = \varphi_{0,k}$ denotes the scaling function. These assumptions are standard (see [Cohen et al., 1993]).






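Let us expand $f(G^{-1})$ in the wavelet basis:

$$f(G^{-1}) = \sum_{j=-1}^{\infty} \sum_k \beta_{jk} \psi_{jk},$$

and it is well known that for $0 < \gamma < \infty$, $f(G^{-1})$ belongs to $B^\gamma_\infty(L_\infty([0, 1]))$ iff (and we will take this as the $B^\gamma_\infty(L_\infty([0, 1]))$ norm):

$$\sup_{j \ge -1} 2^{j(\gamma + \frac12)} \sup_{0 \le k \le 2^j} |\beta_{jk}| =: \|f\|_{B^\gamma} < \infty.$$

In this section our loss will be measured in terms of $L_2$, with respect to the measure $d\rho_X$:

$$\|f\|_{\rho_X} = \Big[\int f(x)^2\, d\rho_X(x)\Big]^{1/2}.$$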



4.1.2. Estimation algorithm. Again, we put

$$t_n = \frac{\log n}{n}, \qquad \lambda_n = \kappa\sqrt{t_n},$$

define

$$\hat G_n(x) = \frac1n \sum_{i=1}^n I\{X_i \le x\},$$

and introduce the order statistics $X_{(1)} \le \dots \le X_{(n)}$. Doing this, we introduce a new ordering on the indices $\{1, \dots, n\}$. Keeping this ordering, we denote $Y_{(1)}, \dots, Y_{(n)}$. Note that $Y_{(1)}, \dots, Y_{(n)}$ is generally not the order statistics of $Y_1, \dots, Y_n$.
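The induced ordering can be made concrete with a short sketch. The helper names below are hypothetical, and the sample is assumed to have no ties, so that $\hat G_n(X_{(i)}) = i/n$.

```python
import numpy as np

def empirical_cdf_at_sample(X):
    """Return G_hat_n(X_i) = rank(X_i) / n for each i (no ties assumed)."""
    n = X.shape[0]
    ranks = np.empty(n, dtype=int)
    ranks[np.argsort(X)] = np.arange(1, n + 1)
    return ranks / n

def induced_order(X, Y):
    """Sort the X_i and carry the Y_i along: returns (X_(i)), (Y_(i))."""
    order = np.argsort(X)
    return X[order], Y[order]
```

With `X = [0.3, 0.1, 0.2]` and `Y = [10, 20, 30]`, `induced_order` gives `Y_(.) = [20, 30, 10]`, which is indeed not the sorted version of `Y`.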
The estimator is constructed in the following way : 

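- Step 1: Estimation of the wavelet coefficients:

$$\hat\beta_{jk} = \frac1n \sum_{i=1}^n Y_i\, \psi_{jk}(\hat G_n(X_i)) = \frac1n \sum_{i=1}^n Y_{(i)}\, \psi_{jk}\Big(\frac{i}{n}\Big)$$

- Step 2: Thresholding:

$$\tilde\beta_{jk} = \hat\beta_{jk}\, I\{|\hat\beta_{jk}| \ge \lambda_n\}$$

- Step 3: Reconstruction:

$$\hat f = \sum_{j=-1}^{J} \sum_k \tilde\beta_{jk}\, \psi_{jk}(\hat G_n(\cdot))$$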

Note that this algorithm is an adaptation of the standard wavelet algorithm introduced in [Donoho and Johnstone, 1994] in the case of an equispaced design. It has been investigated in [Kerkyacharian and Picard, 2004], where the expectation properties of the $L_p(dx)$ losses have been investigated (instead of, as here, the deviation properties of the $L_2(d\rho_X)$ loss). It proves to have very powerful properties. One of them is its remarkable simplicity in terms of computation. To illustrate this, we give here the main steps of the computation algorithm:
Algorithm:

(1) Sort the $X_i$'s,

(2) Change the numbering in such a way that $X_i$ has rank $i$,

(3) Calculate the highest level alpha-coefficients using the formula:

$$\hat\alpha_{J'k} = \frac1n \sum_{i=1}^n \varphi_{J'k}(i/n)\, Y_i, \qquad (2^{J'} = n)$$

(4) Calculate the wavelet coefficients using the classical pyramidal algorithm,

(5) Perform a thresholding algorithm giving rise to the coefficients $\tilde\beta_{jk}$,

(6) Reconstruct the estimator, using again the standard backward pyramidal algorithm, obtaining

$$\hat f = \sum_{j=-1}^{J} \sum_{0 \le k \le 2^j} \tilde\beta_{jk}\, \psi_{jk}(\hat G_n(\cdot)),$$

which is a function especially easy to draw.
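The steps above can be sketched as follows. This is a simplified illustration, not the algorithm as analysed in the paper: it uses the Haar wavelet (which is not continuously differentiable, unlike the wavelets assumed above), computes each coefficient directly from Step 1 rather than with the pyramidal algorithm, and thresholds the scaling coefficient as well. The function names, and the parameters `J` and `kappa`, are hypothetical choices left to the user.

```python
import numpy as np

def haar(j, k, u):
    """Haar psi_{jk}(u) on (0, 1]; j = -1 stands for the scaling function."""
    u = np.asarray(u, dtype=float)
    if j == -1:
        return ((u > 0) & (u <= 1)).astype(float)
    v = (2.0 ** j) * u - k
    plus = ((v > 0) & (v <= 0.5)).astype(float)
    minus = ((v > 0.5) & (v <= 1)).astype(float)
    return 2.0 ** (j / 2) * (plus - minus)

def warped_haar_fit(X, Y, J, kappa):
    """Steps 1 and 2: empirical coefficients on the grid i/n, then thresholding."""
    n = X.shape[0]
    order = np.argsort(X)                      # sort the X_i, carry the Y_i along
    Xs, Ys = X[order], Y[order]
    grid = np.arange(1, n + 1) / n             # G_hat_n(X_(i)) = i / n
    lam = kappa * np.sqrt(np.log(n) / n)       # lambda_n = kappa * sqrt(t_n)
    coeffs = {}
    for j in range(-1, J + 1):
        for k in range(1 if j == -1 else 2 ** j):
            b = np.mean(Ys * haar(j, k, grid))
            if abs(b) >= lam:                  # hard thresholding
                coeffs[(j, k)] = b
    return Xs, coeffs

def warped_haar_predict(Xs, coeffs, x):
    """Step 3: f_hat(x) = sum of kept beta_jk * psi_jk(G_hat_n(x))."""
    u = np.searchsorted(Xs, x, side="right") / Xs.shape[0]   # G_hat_n(x)
    out = np.zeros(np.shape(u))
    for (j, k), b in coeffs.items():
        out += b * haar(j, k, u)
    return out
```

Because the basis is evaluated at $\hat G_n(x)$ rather than at $x$, the design points are effectively made equispaced, which is what lets the equispaced-design algorithm be reused.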
Our aim in this section is to prove the following theorem. 

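Theorem 2. With the conditions above, $\forall s > \frac12$, with

$$\eta_n = \Big[\frac{n}{\log n}\Big]^{-\frac{s}{1+2s}},$$

there exist positive constants $\gamma$, $T$, $D$ such that

$$\sup_{\rho \in M(\Theta_s)} \rho^{\otimes n}\{\|f_\rho - \hat f\| > \eta\} \le T \begin{cases} e^{-\gamma [n 2^{-J} \eta^2 \vee \log n]}, & \eta \ge D\eta_n, \\ 1, & \eta \le D\eta_n, \end{cases}$$

as long as

$$\Big[\frac{n}{\log n}\Big]^{\frac{1}{1+2s}} \le 2^J \le \Big[\frac{n}{\log n}\Big]^{\frac12}.$$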

Remark 2. As mentioned in the introduction, these results are comparable to those obtained in the RKHS situation as concerns the critical value and the exponential rates. The advantage here is that we are able to state the results in the $L_2(\rho_X)$ norm and the regularity conditions are expressed in terms of standard Hölder spaces. We expressed the results in a slightly different way, leaving the choice of $J$ as an option. If we optimize our results in $J$, we take $2^J = [\frac{n}{\log n}]^{\frac{1}{1+2s}}$, which gives better rates but fails to be adaptive. If we want our estimate to be universal (to work for any $s > 1/2$), we need to take $2^J$ of the order of $[\frac{n}{\log n}]^{\frac12}$.

4.2. Proof of the theorem. Throughout the proof, $c$ will denote a constant which may vary from one line to the next, but may be explicitly calculated. For the sake of simplicity we will not make explicit the constants obtained in the proof (although it could be done easily), since we do not think that they are optimal in any sense.

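It will be essential in the sequel to notice that, with the assumptions above, we have:

$$\|f\|_{L_2(X, \rho_X)} = \|f(G^{-1})\|_{L_2([0,1], dx)}.$$

Since $\|f\|_{\rho_X} = \|f(G^{-1})\|_{dx}$, we have, if $\hat f = \sum_{j=-1}^{J} \sum_k \tilde\beta_{jk}\, \psi_{jk}(\hat G_n(\cdot))$,

$$\begin{aligned}
\|\hat f - f_\rho\|_{\rho_X} &= \|\hat f(G^{-1}) - f_\rho(G^{-1})\|_{dx} \\
&= \Big\|\sum_{j=-1}^{J} \sum_k \tilde\beta_{jk}\, \psi_{jk}(\hat G_n(G^{-1})) - \sum_{j=-1}^{\infty} \sum_k \beta_{jk} \psi_{jk}\Big\|_{dx} \\
&\le \Big\|\sum_{j=-1}^{J} \sum_k \tilde\beta_{jk} [\psi_{jk}(\hat G_n(G^{-1})) - \psi_{jk}]\Big\|_{dx} + \Big\|\sum_{j=-1}^{J} \sum_k [\tilde\beta_{jk} - \beta_{jk}] \psi_{jk}\Big\|_{dx} \\
&\quad + \Big\|\sum_{j \ge J+1} \sum_k \beta_{jk} \psi_{jk}\Big\|_{dx}.
\end{aligned}$$

Hence

$$\begin{aligned}
\|\hat f - f_\rho\|_{\rho_X}^2 &\le 3\Big[\Big\|\sum_{j=-1}^{J} \sum_k \tilde\beta_{jk} [\psi_{jk}(\hat G_n(G^{-1})) - \psi_{jk}]\Big\|_{dx}^2 + \sum_{j=-1}^{J} \sum_k [\tilde\beta_{jk} - \beta_{jk}]^2 + \sum_{j \ge J+1} \sum_k \beta_{jk}^2\Big] \\
&=: 3[(I) + (II) + (III)].
\end{aligned}$$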



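If $f_\rho(G^{-1}) \in B^s_\infty(L_\infty([0, 1]))(M)$, then

$$(III) = \sum_{j \ge J+1} \sum_k \beta_{jk}^2 \le \sum_{j \ge J+1} 2^j \sup_k \beta_{jk}^2 \le M^2 \sum_{j \ge J+1} 2^j 2^{-j(2s+1)} \le M^2 2^{-2Js} \le M^2 \eta_n^2$$

if $2^J \ge t_n^{-\frac{1}{1+2s}} = \big(\frac{n}{\log n}\big)^{\frac{1}{1+2s}}$.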

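Let us now study the second term:

$$\begin{aligned}
(II) &\le \sum_{j=-1}^{J} \sum_k [\hat\beta_{jk} - \beta_{jk}]^2 I\{|\hat\beta_{jk}| \ge \lambda_n\}\big[I\{|\beta_{jk}| \ge \lambda_n/2\} + I\{|\beta_{jk}| < \lambda_n/2\}\big] \\
&\quad + \sum_{j=-1}^{J} \sum_k \beta_{jk}^2 I\{|\hat\beta_{jk}| < \lambda_n\}\big[I\{|\beta_{jk}| \ge 2\lambda_n\} + I\{|\beta_{jk}| < 2\lambda_n\}\big] \\
&:= BB + BS + SB + SS.
\end{aligned}$$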

Let us study the term SS. First we remark that, as $f_\rho(G^{-1}) \in B^s_\infty(L_\infty([0, 1]))(M)$, then $|\beta_{jk}| \le M 2^{-j(s + \frac12)}$; hence, if we denote

$$2^{j_s} = t_n^{-\frac{1}{1+2s}},$$

we get

$$\begin{aligned}
SS &\le \sum_{j=-1}^{j_s} \sum_k \beta_{jk}^2 I\{|\beta_{jk}| \le 2\lambda_n\} + \sum_{j=j_s}^{J} \sum_k \beta_{jk}^2 I\{|\beta_{jk}| \le 2\lambda_n\} \\
&\le 2 \cdot 2^{j_s} (2\lambda_n)^2 + \sum_{j=j_s}^{J} 2^j M^2 2^{-2j(s + \frac12)} \\
&\le (8\kappa^2 + 2M^2)\, \eta_n^2.
\end{aligned}$$

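Let us now investigate the term SB. We observe that

$$I\{|\hat\beta_{jk}| < \lambda_n\} I\{|\beta_{jk}| \ge 2\lambda_n\} \le I\{|\hat\beta_{jk} - \beta_{jk}| \ge |\beta_{jk}|/2\} I\{|\beta_{jk}| \ge 2\lambda_n\},$$

hence:

$$\begin{aligned}
SB &\le \sum_{j=-1}^{J} \sum_k \beta_{jk}^2 I\{|\hat\beta_{jk} - \beta_{jk}| \ge |\beta_{jk}|/2\} I\{|\beta_{jk}| \ge 2\lambda_n\} \\
&\le 4 \sum_{j=-1}^{J} \sum_k |\hat\beta_{jk} - \beta_{jk}|^2 I\{|\beta_{jk}| \ge 2\lambda_n\}.
\end{aligned}$$

So

$$BB + SB \le 5 \sum_{j=-1}^{J} \sum_k (\hat\beta_{jk} - \beta_{jk})^2 I\{|\beta_{jk}| \ge \lambda_n/2\} =: 5BB'.$$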
H k 

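Now, we investigate the term $BB'$.

Recall that ordering the sample as $X_{(1)} \le \dots \le X_{(n)}$ introduces a new ordering on the indices $\{1, \dots, n\}$, and that we keep this ordering to denote $Y_{(1)}, \dots, Y_{(n)}$. We also introduce $U_i = G(X_i)$, $i = 1, \dots, n$, as well as the associated $U_{(1)}, \dots, U_{(n)}$. Notice that the $U_{(i)}$'s are ordered (since $G$ is increasing) and the $U_i$'s are i.i.d. uniformly distributed. Then

$$\begin{aligned}
\hat\beta_{jk} - \beta_{jk} &= \frac1n \sum_{i=1}^n Y_{(i)}\, \psi_{jk}\Big(\frac{i}{n}\Big) - \beta_{jk} \\
&= \Big[\frac1n \sum_{i=1}^n f_\rho(X_{(i)})\, \psi_{jk}\Big(\frac{i}{n}\Big) - \int \psi_{jk}\, f_\rho(G^{-1})\Big] + \Big[\frac1n \sum_{i=1}^n \varepsilon_{(i)}\, \psi_{jk}\Big(\frac{i}{n}\Big)\Big] \\
&= \Big[\frac1n \sum_{i=1}^n f_\rho(G^{-1}(U_{(i)}))\, \psi_{jk}\Big(\frac{i}{n}\Big) - \int \psi_{jk}\, f_\rho(G^{-1})\Big] + \Big[\frac1n \sum_{i=1}^n \varepsilon_{(i)}\, \psi_{jk}\Big(\frac{i}{n}\Big)\Big] \\
&:= A_{jk} + B_{jk}. \qquad (22)
\end{aligned}$$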



Let us begin with the following lemma, whose proof is obvious (but which will be useful in the sequel):
Lemma 1. For any $r \ge 1$, we have



1 «H- i 2^+j"i 

y_ \^m-)\ t < t^'i- 1 ' +< — (23) 

i n n 



n — n n 



ustfi t t = N||op|| 00 and < = Nrl^'lUdl-iHU) 1 - 1 
Let us put : 



1 n 

^n(x) = - 5~ I{Ui < x}, A n := sup \t n {x) - x\ 
n ~T xe[0 > 1] 



and s = s A 1 , using (jzBj) for the third inequality. 
1 



i=l 

+ f\ XM \i p (G-\x)^(x)-i p (G-\-))^(-)\ 
z — - - - n n 



i=1 J 



{i-1)/n 

n 



< A^lfpfG-^lhoooo-^IW- 



n * — n 



^- f i/n 2 3 '/ 2 k k 4- N 

+ Y_ [2'/ 2 ||^|| 00 ||f p (G- 1 )||, 0000 n- s + ||^|| l0000 ||f p (G- 1 )|| 00 ]i{ x e [ _t_]} d x 

i=1 J(i-1)/n n ^ ^ 

73j/2 

< A^llfpfG-^Hfeooo^^ + Ti— } 

+ M Halloo IK P ( G- 1 ) |UooooTt-«2-i + N||-iH| l0000 ||f p (G- 1 )|| 00 — 
2i/2 , 

< dA^2- j/2 + C 2 + C 3 n- s 2-i (24) 

n 

where 

Ci=T!+^, C 2 = N||-i|; / || 00) C 3 = N||iJ J || 00 ||f p (G- 1 )||3oooo 
The last line uses the fact that for j < J, 2 2 i < n. We can then state the following lemma : 

Lemma 2. For $J$ such that $t_n^{-\frac{1}{1+2s}} \le 2^J \le t_n^{-\frac12}$, we have:

J 

p^ n (^ Y- A * - ^ - exp - Cn2 ^ 2 V lo S n > ( 25 ) 

H k 

for allr\ > Dr\ n , where C = 2(2CfN)~5~ 

Proof of the lemma : 
We observe that 

> v- r 2>/ 2 l2 2 2 ' 1 2 

5" 5" [ — ] < c— < C- « t£. 

j=j k 
I

and Y_ Y. n_2 ' 2H < J n ~ 2s « 
H k 

£ A^2-' < JA? (26) 

j=j k 

Hence, for r\ > Dr| n , and n large enough, 

I 

j=j k 

< K exp -cu[ri,r 1/2 ]h{ri 2 < 2C 2 J} 

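The last line uses the following Dvoretzky, Kiefer and Wolfowitz bound (see for instance the review on the subject in Devroye and Lugosi, section 12): there exists a universal constant $K$ such that, for any $\lambda > 0$,

$$P(\Delta_n \ge \lambda) \le K \exp(-2n\lambda^2) \qquad (27)$$

where $\Delta_n$ is the uniform deviation, defined above, between the empirical distribution function of the $U_i$'s and the identity (noticing also that $\Delta_n \le 1$).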
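To make the bound (27) concrete, the sketch below computes the uniform deviation of a sample in $[0, 1]$ from the uniform distribution function and evaluates the right-hand side of (27). The function names are hypothetical, and the default $K = 2$ is the constant from Massart's sharpening of the inequality, which is an assumption here since the text only asserts that some universal $K$ exists.

```python
import numpy as np

def sup_deviation(U):
    """Delta_n = sup_x |F_n(x) - x| for a sample U in [0, 1].

    The supremum is attained at a jump of the empirical distribution
    function F_n, so it suffices to check both sides of each order statistic.
    """
    Us = np.sort(U)
    n = Us.shape[0]
    i = np.arange(1, n + 1)
    return max(np.max(i / n - Us), np.max(Us - (i - 1) / n))

def dkw_bound(n, lam, K=2.0):
    """Right-hand side of (27), capped at 1; K = 2 is an assumed constant."""
    return min(1.0, K * np.exp(-2.0 * n * lam ** 2))
```

Comparing the empirical frequency of `sup_deviation(U) >= lam` over many simulated uniform samples with `dkw_bound(n, lam)` is a simple way to see how tight the exponential rate is.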

Now, for s > 1, n[r|]5"T2F = nr| 2 T~ 1/2 > nr| 2 2~ J V logn. 
Identically, for 1 /2 < s < 1 and r\ > Dr| n , 

> nri 2 2- I 2- 2sjs( s-'"2' s J^ 

> rn] 2 2- ] 2 u(2s -^ > m] 2 2- ] V logn 

This completes the proof of the lemma. □
Let us now investigate the term corresponding to the $B_{jk}$'s. We have the following lemma:

Lemma 3. For $J$ such that $t_n^{-\frac{1}{1+2s}} \le 2^J \le t_n^{-\frac12}$, there exists a constant $c$ such that:
J 

p^f^^B^Ifjkl > A n /2} > ti 2 ) < c exp -cti2- j t 1 2 V logn, (28) 

H k 
for all] >r[> Dr\ n 
Proof of the lemma : 

Let us first remark that since fpfG" 1 ) G B^L^O, 1]))(M), then |(3 jk | < M2 H(s+ ^ and 
then, if k > 2M, |(3jjJ > A n /2 implies j < j s , hence : 

Y_ 1 L B fA\^\>K/2} < 

j=j k j=j k 

js 

< Y 2 ] sup B 2 k < 22 js sup B 2 k 

k jk 

1=2 

We will investigate separately the cases $\eta \le 1$ and $\eta > 1$. Let us begin with the first case:



I 

p «n ( ^-^- B 2 J{|(3 . k| > An/2} > T1 2 ) < p®-(2^+ 1 S upBf k >Ti ; 
M k jk 



i=j k 1=1 n 

- 4t4- 2(nC 3 + nnM||*|U2!l-i. 1/2/3) 1 



* ^ t ' ^c i+ 'Si./ } ) ' (29) 

In the last line we used Bernstein's inequality (cf. [Bernstein, 1946]), since the variables $\psi_{jk}(\frac{i}{n})\varepsilon_i$ are a sequence of independent bounded random variables (bounded by $M\|\psi\|_\infty 2^{j/2}$), with zero mean and


MY W-)£i] 2 < C 3 n 
z — n 



(M 2 (t 2 + t^) := C 3 using 
Hence we obtain : 



J i 

P^CX. X. Bfic^lPjicl > W2} > T] 2 ) < 2exp{-cnT 1 2 2- js +-logn} (30) 
H k 

with c = 4(C 3 + MHiblloo/S) -1 since r\ < 1. As T| > DTi n , it is easy to see that for D large 
enough, cnr| 2 2~' s > 21ogn. Hence in this case, we get the bound : exp — cn2^r\ 2 V logn 

Let us now study the case where $\eta \ge 1$. We will use McDiarmid's inequality (see [McDiarmid, 1989]); we have the following lemma:

Lemma 4. For $J$ such that $t_n^{-\frac{1}{1+2s}} \le 2^J \le t_n^{-\frac12}$, we have:
J 

p 8n (X L B fk I {IPjkl > A n /2} > ri 2 ) < exp -Cr^ 2 V logn, (31) 
H k 

/or oWti>1, andC = ^T, B 2 = 2M 2 N 2 ||iL>|| 00 . 

Proof of the lemma : We have : 
I 

p ^n ( J- J" B 2 kI{ |p. k | > An/2} > T] 2 ) < p ®n (F(£i| . . f Eft) > ^ 

J=j k 



with : 

F(e 1 ,...,e l ,...,e n ) = Y_ Y. "i^L^"^ 
1 n

n z ^ — n 

i=j k i=1 



|AFi| = |F(ei,...,ei,...,e n )-F(e 1 ,...,e{,...,e n )| 

TV \ TV TV TV 



j=j k \ i=1 i=1 

< 2M 2 £ Z iZNM^IhM^ 

- K < In IT-IT 

< 2M 2 N 2 ^|H| 2 ^2^ 

< 2M 2 N 2 ||^||^I =: B 2 2£ 



On the other hand, 



1 TL 

E p ®nF(e 1 ,...,e n ) < j~ Y_ —.[Y xpjk( — )£il 2 

z — L — n/ z — n 

j=2 k 1=1 



TV ^ — n 

j=] k x=l 
)=2 k 

< 2M 2 C 3 — ^cti 2 
n 

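Hence, for $\eta \ge D\eta_n$,

\[
P^{\otimes n}\Big(\sum_{j\le J}\sum_k B_{jk}^2\, I\{|\beta_{jk}| > \lambda_n/2\} > \eta^2\Big)
\;\le\; P^{\otimes n}\big(|F(\epsilon_1,\ldots,\epsilon_n) - E_{P^{\otimes n}}F(\epsilon_1,\ldots,\epsilon_n)| > \eta^2/2\big)
\;\le\; \exp\{-nC\eta^4\} .
\]

Now, for $\eta > 1$, we obviously have $Cn\eta^4 \ge C\,[n2^{-J}\eta^2 \vee \log n]$, which proves the result of the lemma. □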
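The bounded-differences argument can be illustrated numerically. The following Python sketch is a toy instance, not the functional $F$ of the proof: it takes $F(\epsilon) = (\frac1n\sum_i \epsilon_i)^2$ with $\epsilon_i$ uniform on $[-1,1]$, for which changing one coordinate moves $F$ by at most $4/n$. The names `mcdiarmid_bound` and `squared_mean` are hypothetical, and the one-sided form $\exp(-2t^2/\sum_i c_i^2)$ of McDiarmid's inequality is used.

```python
import math
import random


def mcdiarmid_bound(t, c):
    """One-sided McDiarmid bound exp(-2 t^2 / sum_i c_i^2).

    c[i] is an assumed bound on the change of F when only coordinate i moves.
    """
    return math.exp(-2.0 * t * t / sum(ci * ci for ci in c))


def squared_mean(eps):
    """Toy functional F(eps) = (mean of eps)^2."""
    m = sum(eps) / len(eps)
    return m * m


n, t = 100, 0.3
# For eps_i in [-1, 1]: the mean moves by at most 2/n and |m + m'| <= 2,
# so |F - F'| <= 4/n for a change in one coordinate.
c = [4.0 / n] * n
expected_f = 1.0 / (3.0 * n)  # E[(mean)^2] = Var(uniform[-1, 1]) / n

rng = random.Random(1)
trials = 2000
hits = sum(
    squared_mean([rng.uniform(-1.0, 1.0) for _ in range(n)]) - expected_f >= t
    for _ in range(trials)
)
mc_bound = mcdiarmid_bound(t, c)
mc_freq = hits / trials
print(mc_bound, mc_freq)
```

With $c_i$ of order $1/n$, the exponent is of order $nt^2$, which is the mechanism behind a rate of the form $\exp\{-nC\eta^4\}$ when $t$ is of order $\eta^2$.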

Notice also that, using exactly the same proof, we also have the following result, which will be used later:

Lemma 5. For $J$ such that $t_n^{-\frac{2}{1+2s}} \le 2^J \le t_n^{-1}$, we have:

\[
P^{\otimes n}\Big(\sum_{j\le J}\sum_k B_{jk}^2 \ge \lambda^2\Big) \;\le\; \exp -C\,[n\lambda^4 \vee \log n], \qquad (32)
\]

for all $\lambda^2 \ge 2M^2C_1 t_n/2$; $C = (2B^2)^{-1}$, $B^2 = 2M^2N^2\|\psi\|_\infty$.
This completes the bound on the term (BB). We now proceed to bound the term (BS):

\[
\sum_{j\le J}\sum_k (\hat\beta_{jk}-\beta_{jk})^2\, I\{|\beta_{jk}| < \lambda_n/2\}\, I\{|\hat\beta_{jk}-\beta_{jk}| > \lambda_n/2\}
\;\le\; \sum_{j\le J}\sum_k (\hat\beta_{jk}-\beta_{jk})^2\, I\{|\hat\beta_{jk}-\beta_{jk}| > \lambda_n/2\}
\]
\[
\;\le\; 2^{J+1}\sup_{j\le J,\,k}\big\{(\hat\beta_{jk}-\beta_{jk})^2 ;\ |\hat\beta_{jk}-\beta_{jk}| > \lambda_n/2\big\} .
\]

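Hence

\[
P^{\otimes n}\Big(\sum_{j\le J}\sum_k (\hat\beta_{jk}-\beta_{jk})^2\, I\{|\hat\beta_{jk}-\beta_{jk}| > \lambda_n/2\} > \eta^2\Big)
\;\le\; 2^{J+1} P^{\otimes n}\big(|\hat\beta_{jk}-\beta_{jk}| > \eta 2^{-J/2}/2 \vee \lambda_n/2\big) .
\]

Now, using (25), we get

\[
2^{J+1} P^{\otimes n}\big(|\hat\beta_{jk}-\beta_{jk}| > \eta 2^{-J/2}/2 \vee \lambda_n/2\big)
\;\le\; 2^{J+1} P^{\otimes n}\big(C_1\Delta_n 2^{J/2} > \eta 2^{-J/2}/4 \vee \lambda_n/8\big)
\]
\[
\;+\; 2^{J+1} P^{\otimes n}\big(|B_{jk}| > \eta 2^{-J/2}/4 \vee \lambda_n/8\big)
\]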



< 2 J+1 Kexp-cn(^)h{r| < 4C 



4 



J+w -n (T1 2 2-Vl6VA 2 /64) 

U 2C 3 + (Ti/4VA n /8)2^)M||^|| 0O 1 

The first term may be bounded as in the earlier lemma; the second one may be bounded by $\exp -c\,[n2^{-J}\eta^2 \vee \log n]$, with $c = (64C_3)^{-1}$ if $\eta \le 1$.
For $\eta > 1$, we have:

\[
\sum_{j\le J}\sum_k (\hat\beta_{jk}-\beta_{jk})^2\, I\{|\beta_{jk}| < \lambda_n/2\}
\;\le\; 2^{J+1}\sup_{j\le J,\,k} A_{jk}^2 + \sum_{j\le J}\sum_k B_{jk}^2 .
\]

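Hence,

\[
P^{\otimes n}\Big(\sum_{j\le J}\sum_k (\hat\beta_{jk}-\beta_{jk})^2\, I\{|\beta_{jk}| < \lambda_n/2\} > \eta^2\Big)
\;\le\; P^{\otimes n}\Big(2^{J+1}\sup_{j\le J,\,k} A_{jk}^2 > \eta^2/2\Big)
\;+\; P^{\otimes n}\Big(\sum_{j\le J}\sum_k B_{jk}^2 > \eta^2/2\Big) .
\]

The first term, treated as above, gives the same bound, since in this case the condition $\eta \le 1$ was not necessary. For the second term, we use the lemma above.

This completes the proof for the term (II), which can be summarised in the following proposition.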

Proposition 1. Vs > \ 

sup pnr )ji3jk-M 2 >t 2 }<c{ i ' l^y 



H k 

tl^n]^ < 21 < A" C = ( 64C 3)^ A (2B 2 )- 1 A 2(^)i 
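It remains now to study the term (I).
We have:

\[
\Big\|\sum_{j\le J}\sum_k \hat\beta_{jk}\big[\psi_{jk}(G_n(G^{-1})) - \psi_{jk}\big]\Big\|_{dx}
\;\le\; \Big\|\sum_{j\le J}\sum_k (\hat\beta_{jk}-\beta_{jk})\big[\psi_{jk}(G_n(G^{-1})) - \psi_{jk}\big]\Big\|_{dx}
+ \Big\|\sum_{j\le J}\sum_k \beta_{jk}\big[\psi_{jk}(G_n(G^{-1})) - \psi_{jk}\big]\Big\|_{dx}
\]
\[
\;\le\; \Big\|\sum_{j\le J}\sum_k |\hat\beta_{jk}-\beta_{jk}|\,\big[|\psi_{jk}(G_n(G^{-1})) - \psi_{jk}|\big]\Big\|_{dx}
+ \Big\|\sum_{j\le J}\sum_k |\beta_{jk}|\,\big[|\psi_{jk}(G_n(G^{-1})) - \psi_{jk}|\big]\Big\|_{dx}
\]

since $|\hat\beta_{jk}| \le |\hat\beta_{jk}-\beta_{jk}| + |\beta_{jk}|$. If $Z = \sum_j\sum_k |\beta_{jk}|\psi_{jk}$, we observe that $\|Z\|_{s\infty\infty} = \|f_\rho(G^{-1})\|_{s\infty\infty}$, so:

\[
\Big\|\sum_{j\le J}\sum_k |\beta_{jk}|\,\big[|\psi_{jk}(G_n(G^{-1})) - \psi_{jk}|\big]\Big\|_{dx}
= \|Z(G_n(G^{-1})) - Z\|_{dx}
\;\le\; \|f_\rho(G^{-1})\|_{s\infty\infty}\,\Delta_n^{s} .
\]

Hence,

\[
P^{\otimes n}\Big(\Big\|\sum_{j\le J}\sum_k |\beta_{jk}|\,\big[|\psi_{jk}(G_n(G^{-1})) - \psi_{jk}|\big]\Big\|_{dx} > \eta\Big)
\;\le\; P^{\otimes n}\big(\|f_\rho(G^{-1})\|_{s\infty\infty}\,\Delta_n^{s} > \eta\big)
\]
\[
\;\le\; K\exp\{-cn\eta^{2/s}\}\, I\{\eta/\|f_\rho(G^{-1})\|_{s\infty\infty} \le 1\}
\;\le\; K\exp\{-c\,[n\eta^2 2^{-J} \vee \log n]\} ,
\]

as above (see the proof of the lemma), with $c = 2\|f_\rho(G^{-1})\|_{s\infty\infty}^{-2/s}$ here.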
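Concerning the stochastic term, using (22) we have:

\[
\Big\|\sum_{j\le J}\sum_k |\hat\beta_{jk}-\beta_{jk}|\,\big[|\psi_{jk}(G_n(G^{-1})) - \psi_{jk}|\big]\Big\|_{dx}
\;\le\; \Big\|\sum_{j\le J}\sum_k |A_{jk}|\,\big[|\psi_{jk}(G_n(G^{-1})) - \psi_{jk}|\big]\Big\|_{dx}
+ \Big\|\sum_{j\le J}\sum_k |B_{jk}|\,\big[|\psi_{jk}(G_n(G^{-1})) - \psi_{jk}|\big]\Big\|_{dx} .
\]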
Now, if $Z' = \sum_{j\le J}\sum_k |A_{jk}|\psi_{jk}$, using (20) and $s > \frac12$,

\[
\|Z'\|_{\frac12\infty\infty} \;\le\; \sup_{j\le J,\,k}\{2^{j}|A_{jk}|\}
\;\le\; \sup_{j\le J,\,k}\Big\{C_1\Delta_n^{s}2^{j/2} + (C_2 + C_3)\frac{2^{3j/2}}{n}\Big\}
\;\le\; C_1\Delta_n^{s}2^{J/2} + (C_2 + C_3)\frac{2^{3J/2}}{n} .
\]

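Let us investigate separately the two contributions. As above,

\[
P^{\otimes n}\Big(\Big\|\sum_{j\le J}\sum_k |A_{jk}|\,\big[|\psi_{jk}(G_n(G^{-1})) - \psi_{jk}|\big]\Big\|_{dx} > \eta\Big)
\;\le\; P^{\otimes n}\big(\|Z'\|_{\frac12\infty\infty}\,\Delta_n^{1/2} > \eta\big) .
\]

Furthermore,

\[
P^{\otimes n}\big(\Delta_n^{\frac12+s}\,2^{J/2} > \eta/(2C_1)\big)
\;\le\; \exp\Big\{-n\Big[\frac{\eta 2^{-J/2}}{2C_1}\Big]^{\frac{4}{1+2s}}\Big\}\, I\Big\{\frac{\eta 2^{-J/2}}{2C_1} \le 1\Big\} .
\]

Now, as $s > 1/2$, we have, for $\eta 2^{-J/2} \le 2C_1$, $n(\eta 2^{-J/2})^{\frac{4}{1+2s}} \ge (2C_1)^{\frac{2-4s}{1+2s}}\, n(\eta 2^{-J/2})^2 \vee \log n$, for $\eta \ge \eta_n$. On the other hand, for $C = C_2 + C_3$,

\[
P^{\otimes n}\Big(C\,\Delta_n^{1/2}\,\frac{2^{3J/2}}{n} > \eta\Big)
\;\le\; \exp\{-2n\,C^{-4}(n\eta 2^{-3J/2})^4\}\, I\{n\eta 2^{-3J/2} \le C\} .
\]

And obviously, on the range we are considering, $n(n\eta 2^{-3J/2})^4 \ge n(\eta 2^{-J/2})^2 \vee \log n$.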
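Now for the last term, $\big\|\sum_{j\le J}\sum_k |B_{jk}|\,[|\psi_{jk}(G_n(G^{-1})) - \psi_{jk}|]\big\|_{dx}$, considering again the $U_{(i)}$'s and putting $U_{(0)} = 0$, $U_{(n+1)} = 1$, we have, on $[U_{(i)}, U_{(i+1)}]$, $G_n(G^{-1}(x)) = \frac in$. For any arbitrary $a > 0$, we have

\[
\Big\|\sum_{j\le J}\sum_k |B_{jk}|\,\big[|\psi_{jk}(G_n(G^{-1})) - \psi_{jk}|\big]\Big\|_{dx}^2
\;\le\; 2^{(J+1)a}\sum_{j\le J} 2^{-ja}\int\Big[\sum_k |B_{jk}|\,|\psi_{jk}(G_n(G^{-1})) - \psi_{jk}|\Big]^2 dx
\]
\[
\;\le\; 2^{(J+1)a}\sum_{j\le J} 2^{-ja}\sum_{i=0}^{n}\int_{U_{(i)}}^{U_{(i+1)}}\Big[\sum_k |B_{jk}|\,\big|\psi_{jk}\big(\tfrac in\big) - \psi_{jk}(x)\big|\Big]^2 dx .
\]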

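Now, we will distinguish two cases: either $\frac in \in [U_{(i)} - \frac{N}{2^j},\, U_{(i+1)} + \frac{N}{2^j}]$ (case I) or not (case II, which implies that $\Delta_n 2^{j} > N$).

In case I, if we denote $\Delta_{n,i} = \sup\{|\frac in - U_{(i)}|,\, |\frac in - U_{(i+1)}|\}$, and $I_{jk}$ is the support of $\psi_{jk}$, then, as $\psi$ is continuously differentiable, we get, for $x \in [U_{(i)}, U_{(i+1)}]$:

\[
\Big[\sum_k |B_{jk}|\,\big|\psi_{jk}\big(\tfrac in\big) - \psi_{jk}(x)\big|\Big]^2
\;\le\; \Big[\sum_k |B_{jk}|\,\|\psi'\|_\infty 2^{3j/2}\Delta_{n,i}\, I_{I_{jk}}(x)\Big]^2
\;\le\; N\sum_k |B_{jk}|^2\,\|\psi'\|_\infty^2\, 2^{3j}\Delta_{n,i}^2\, I_{I_{jk}}(x) .
\]

The last inequality is true because only a finite number of the $I_{I_{jk}}(x)$'s are not zero at the same time.

If we now remark that in case I, $\Delta_{n,i} \le 2N2^{-j} \wedge \Delta_n$, we get, for $x \in [U_{(i)}, U_{(i+1)}]$: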
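\[
\Big[\sum_k |B_{jk}|\,\big|\psi_{jk}\big(\tfrac in\big) - \psi_{jk}(x)\big|\Big]^2
\;\le\; 2N^2\sum_k |B_{jk}|^2\,\|\psi'\|_\infty^2\, 2^{2j}\Delta_n\, I_{I_{jk}}(x) .
\]

In case II, we get, for $x \in [U_{(i)}, U_{(i+1)}]$, using again the fact that only a finite number of $\psi_{jk}$'s are not zero at the same time:

\[
\Big[\sum_k |B_{jk}|\,\big|\psi_{jk}\big(\tfrac in\big) - \psi_{jk}(x)\big|\Big]^2
\;\le\; 2\Big\{\Big[\sum_k |B_{jk}|\,\big|\psi_{jk}\big(\tfrac in\big)\big|\Big]^2 + \Big[\sum_k |B_{jk}|\,|\psi_{jk}(x)|\Big]^2\Big\}\, I\{\Delta_n 2^{j} > N\}
\]
\[
\;\le\; 2\Big\{c\,2^{j}\sup_k B_{jk}^2 + \Big[\sum_k |B_{jk}|\,|\psi_{jk}(x)|\Big]^2\Big\}\, I\{\Delta_n 2^{j} > N\} .
\]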

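Putting the two cases together, we deduce:

\[
\Big\|\sum_{j\le J}\sum_k |B_{jk}|\,\big[|\psi_{jk}(G_n(G^{-1})) - \psi_{jk}|\big]\Big\|_{dx}^2
\;\le\; c\,2^{Ja}\sum_{j\le J} 2^{-ja}\sum_{i=0}^{n}\int_{U_{(i)}}^{U_{(i+1)}}\Big[\sum_k |B_{jk}|\,\big|\psi_{jk}\big(\tfrac in\big) - \psi_{jk}(x)\big|\Big]^2 dx
\]
\[
\;\le\; c\,2^{Ja}\sum_{j\le J} 2^{-ja}\sum_{i=0}^{n}\int_{U_{(i)}}^{U_{(i+1)}}\Big\{N^2\sum_k |B_{jk}|^2\,\|\psi'\|_\infty^2\, 2^{2j}\Delta_n\, I_{I_{jk}}(x)
+ \Big(c\,2^{j}\sup_{j\le J,\,k} B_{jk}^2 + \Big[\sum_k |B_{jk}|\,|\psi_{jk}(x)|\Big]^2\Big)\, I\{\Delta_n 2^{J} > N\}\Big\}\, dx
\]
\[
\;\le\; c\,\Big\{\sum_{j\le J}\sum_k |B_{jk}|^2\, 2^{J}\Delta_n + 2^{J}\sup_{j\le J,\,k} B_{jk}^2\, I\{\Delta_n 2^{J} > N\}\Big\}
\;=:\; A + B .
\]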



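To study the first term, again using Lemma 5 and (27), we get

\[
P^{\otimes n}(A > \eta^2/3) \;\le\; P^{\otimes n}\Big(\sum_{j\le J}\sum_k |B_{jk}|^2\, 2^{J}\Delta_n \ge c\eta^2\Big)
\;\le\; P^{\otimes n}\Big(\sum_{j\le J}\sum_k |B_{jk}|^2\, 2^{J} \ge t^2\Big) + P^{\otimes n}\big(\Delta_n \ge c\eta^2/t^2\big)
\]
\[
\;\le\; C\exp -C\Big[n\frac{t^4}{2^{2J}} \vee \log n\Big] + K\exp -n\frac{2c^2\eta^4}{t^4}
\]

for $t^2 2^{-J} \ge c\,t_n/2$. Optimizing in $t$, we find, for $t^4 = c\eta^2 2^{J}$,

\[
P^{\otimes n}(A > \eta^2/3) \;\le\; \exp -[n\eta^2 2^{-J}] .
\]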
This is valid if $t^2 2^{-J} \ge c\,t_n/2$, i.e. $\eta 2^{-J/2} \ge c\,n^{-1/2}$.
Now taking t = mj, we find 

\[
P^{\otimes n}(A > \eta^2/3) \;\le\; \exp -[d\log n]
\]

using again the fact that $s > \frac12$ and $\eta \ge D\eta_n$.

On the other hand, we also have the following bound, using the Bernstein inequality (see (30)):

\[
P^{\otimes n}\Big(\sum_{j\le J}\sum_k |B_{jk}|^2\, 2^{J} \ge t^2\Big) \;\le\; 2^{J} P^{\otimes n}\big(|B_{jk}|^2\, 2^{2J} \ge t^2\big) \;\le\; 2^{J}\exp -nct^2 2^{-2J} \qquad (33)
\]

for $t2^{-J/2} \le c'$. If then again we optimize in $t$, we find $t^2 = \eta^{4/3}2^{2J/3}$, leading to the rate $\exp -n\eta^{4/3}2^{-4J/3}$. We have $\eta^{4/3}2^{-4J/3} \ge \eta^2 2^{-J}$ for $\eta \le 2^{-J/2}$. In this case, we precisely have

\[
t^2 2^{-J/2} = \eta^{4/3}\,2^{2J/3}\,2^{-J/2} \;\le\; 2^{-J/2} .
\]
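The choice of $t$ in this optimization can be checked numerically. The Python sketch below sets all unnamed constants to 1 (an assumption made only for illustration) and uses the hypothetical helpers `exponents` and `balancing_t`. It verifies that $t^2 = \eta^{4/3}2^{2J/3}$ equalizes the Bernstein-type exponent $nt^22^{-2J}$ and the exponent $n\eta^4/t^4$ coming from the bound on $\Delta_n$, and that the common value dominates $n\eta^22^{-J}$ when $\eta \le 2^{-J/2}$.

```python
import math


def exponents(n, eta, J, t):
    """Exponents of the two competing bounds, with all constants set to 1."""
    bernstein_exp = n * t**2 * 2.0 ** (-2 * J)  # from the bound (33)
    dkw_exp = n * eta**4 / t**4                 # from P(Delta_n >= eta^2 / t^2)
    return bernstein_exp, dkw_exp


def balancing_t(eta, J):
    """Value of t with t^2 = eta^(4/3) * 2^(2J/3)."""
    return eta ** (2.0 / 3.0) * 2.0 ** (J / 3.0)


n, eta, J = 10_000, 0.05, 4          # eta <= 2^(-J/2) = 0.25
t = balancing_t(eta, J)
e1, e2 = exponents(n, eta, J, t)
rate = n * eta ** (4.0 / 3.0) * 2.0 ** (-4.0 * J / 3.0)
reference = n * eta**2 * 2.0 ** (-J)
print(e1, e2, rate, reference)
```

Both exponents coincide with $n\eta^{4/3}2^{-4J/3}$, which is why this value of $t$ is the natural balancing choice.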

It is obvious that the second term (B) may be bounded (using (27)) by

\[
P^{\otimes n}(\Delta_n 2^{J} > N) \;\le\; K\exp -2nN^2 2^{-2J} \;\le\; \exp -2N\log n .
\]
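The bound on $\Delta_n$ used here has the form of the Dvoretzky-Kiefer-Wolfowitz inequality. As a rough sanity check, the Python sketch below simulates the sup-distance between the empirical distribution function of a uniform sample and the identity, and compares the exceedance frequency with $K\exp(-2n\lambda^2)$. The value $K = 2$ (Massart's constant) and the uniform sampling are assumptions of this sketch, and the names `dkw_bound` and `ks_distance_uniform` are hypothetical.

```python
import math
import random


def dkw_bound(n, lam, K=2.0):
    """DKW-type bound K * exp(-2 n lam^2); K = 2 is assumed here."""
    return K * math.exp(-2.0 * n * lam * lam)


def ks_distance_uniform(sample):
    """sup_x |G_n(x) - x| for a sample drawn from the uniform law on [0, 1]."""
    xs = sorted(sample)
    n = len(xs)
    # The supremum is attained just before or at an order statistic.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


rng = random.Random(2)
n, lam, trials = 100, 0.1, 2000
exceed = sum(
    ks_distance_uniform([rng.random() for _ in range(n)]) > lam
    for _ in range(trials)
)
dkw_freq = exceed / trials
dkw_value = dkw_bound(n, lam)
print(dkw_value, dkw_freq)
```

Replacing `lam` by $N2^{-J}$ gives the exponent $2nN^22^{-2J}$ appearing in the display above.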
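Now, we have, using (30),

\[
P^{\otimes n}\Big(\sup_{j\le J,\,k} B_{jk}^2\, 2^{J} > c\eta^2\Big) \;\le\; c\exp -cn\eta^2 2^{-J}
\]

if $\eta \le c''$. Notice that the constant $c''$ may be chosen arbitrarily. Of course this choice will change the constant $c$. Hence, let us take $c'' = MN$, and now let us remark that

\[
2^{j}B_{jk}^2 \;\le\; 2^{j}\Big[\frac1n\sum_{i} M2^{j/2}\, I\Big\{\tfrac in \in \Big[\tfrac{k}{2^j}, \tfrac{k+N}{2^j}\Big]\Big\}\Big]^2
\;\le\; 2^{j}\Big[\frac1n\, M2^{j/2}\,\frac{nN}{2^{j}}\Big]^2 \;\le\; M^2N^2 .
\]

Hence the probability for $\sup_{j\le J,\,k} B_{jk}^2\, 2^{j}$ to exceed $\eta^2$ is zero for $\eta^2 > M^2N^2$.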

This completes the bound on the term SS and ends the proof of the theorem.